Storage and Query Processing Optimizations for Hierarchically-Organized Data
Authors
Abstract
Hierarchical data arises wherever multiple pieces of information are connected by a relationship. The first chapter of my thesis deals with processing relational queries in a native XML storage system. We exploit the hierarchical nature of XML by equating the nested structure of XML documents with the key relationship between two data items. A join query can then be rewritten as a storage scan, which accelerates the query. The second chapter compares row- and column-oriented storage optimizations under the assumption of read-mostly data access. We focus on two optimizations, “super tuples” and “column abstraction”, to elucidate the difference between row and column storage layouts. Column abstraction optimizes hierarchically organized data by storing repeating values only once. In the third chapter, we extend the read-optimized relational store to evaluate two predicate evaluation strategies: a scan-based strategy over both standard super tuples and the PAX layout, and an index-based strategy with and without a slot array in super tuples.
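The idea of storing repeating values only once can be illustrated with a small sketch. This is not the thesis's implementation: it assumes that column abstraction amounts to factoring out the shared leading column values of sorted rows, and the names `abstract_columns`, `expand`, and `key_width` are hypothetical.

```python
from itertools import groupby


def abstract_columns(rows, key_width):
    """Group rows by their first `key_width` columns, keeping each key once.

    Assumption: rows are tuples and the repeated values sit in the leading
    columns; this layout is illustrative, not the thesis's storage format.
    """
    prefix = lambda r: r[:key_width]
    ordered = sorted(rows, key=prefix)  # groupby needs equal keys adjacent
    return [
        (key, [r[key_width:] for r in group])
        for key, group in groupby(ordered, key=prefix)
    ]


def expand(groups):
    """Rebuild the flat rows from the grouped representation."""
    return [key + rest for key, rests in groups for rest in rests]


rows = [
    ("Alice", "2024-01-05", "widget", 2),
    ("Alice", "2024-01-05", "gadget", 1),
    ("Bob", "2024-01-07", "widget", 5),
]

groups = abstract_columns(rows, key_width=2)

# Count stored values: each key once, plus the non-key remainder of every row.
stored_values = sum(len(k) + sum(len(r) for r in rs) for k, rs in groups)
flat_values = sum(len(r) for r in rows)
```

In this toy input the flat layout holds 12 values while the grouped layout holds 10; the saving grows with the number of rows that share a key.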
Similar resources
Processing Star Queries on Hierarchically-Clustered Fact Tables
Star queries are the most prevalent kind of queries in data warehousing, OLAP and business intelligence applications. Thus, there is an imperative need for efficiently processing star queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarchically cluster the fact table data of a star schema [DRSN98, MRB99, KS01]. In t...
Revisiting Database Storage Optimizations on Flash
The database storage hierarchy has been heavily optimized for the performance characteristics of disks. Storage managers typically employ row- or column-oriented storage layouts, or a combination, to improve the I/O performance of different query workloads with disks. The recent rise of flash memory-based solid-state drives (SSDs) significantly changes the performance characteristics of storage: t...
BitMat – Scalable Indexing and Querying of Large RDF Graphs
The growing size of Semantic Web data expressed in the form of Resource Description Framework (RDF) has made it necessary to develop effective ways of storing this data to save space and to query it in a scalable manner. SPARQL – the query language for RDF data – closely follows SQL syntax. As a natural consequence most of the RDF storage and querying engines are based on modern database storag...
Query Processing in Self-Organized Storage Systems
Storage systems are increasingly approaching their limits regarding system response to node overload and failure as well as overall scalability. Self-organized systems can be a solution to those issues. However, query processing research has not yet evolved to this area. This research proposal aims at extending distributed query processing to self-organized systems. The different components are ...
Mobile Query Optimization Based on Agent-Technology for Distributed Data Warehouse and OLAP Applications
With the rapid collection of data in a wide variety of fields, ranging from business transactions through medical investigations to scientific research, the demands on data analysis tools are ever growing. Today’s challenges are less related to data storage and information retrieval, but can rather be found in the analysis of data on a global scale in a heterogeneous information system: technologies...